Method for Generating and Processing Transition Streams



CROSS REFERENCE TO RELATED APPLICATIONS 



This application is a continuation-in-part of U.S. patent application serial number 
09/347,213, filed July 2, 1999 for FRAME-ACCURATE SEAMLESS SPLICING OF 
INFORMATION STREAMS (attorney docket number 13235) which is incorporated herein 
by reference in its entirety. This application claims the benefit of U.S. provisional patent 
application serial number 60/129,275, filed April 14, 1999 and incorporated herein by 
reference in its entirety. 

The invention relates to communications systems generally and, more particularly, 
the invention relates to a method for splicing or concatenating information streams in a 
substantially seamless manner. 



BACKGROUND OF THE DISCLOSURE

In several communications systems the data to be transmitted is compressed so that
the available bandwidth is used more efficiently. For example, the Moving Picture Experts
Group (MPEG) has promulgated several standards relating to digital data delivery systems.
The first, known as MPEG-1, refers to ISO/IEC standards 11172 and is incorporated herein
by reference. The second, known as MPEG-2, refers to ISO/IEC standards 13818 and is
incorporated herein by reference. A compressed digital video system is described in the
Advanced Television Systems Committee (ATSC) digital television standard document
A/53, and is incorporated herein by reference.

It is important to television studios and other "consumers" of information streams to
be able to concatenate or splice between information streams (e.g., transport encoded
program streams incorporating video, audio and other associated information sub-streams)
in a substantially seamless and frame accurate manner. "Frame accurate" means that a
splice occurs precisely at the frames selected by the user, regardless of the frame type of the
encoded frame (e.g., I-, P- or B-frame encoding). "Seamless splice" means a splice which
results in a continuous, valid MPEG stream. Thus, a frame accurate seamless splicer will
preserve an exact number of frames when performing a frame accurate seamless splice of a
first information stream into a second information stream (e.g., a transport encoded program
comprising a 900 video frame commercial presentation may be scheduled into a "slot" of
exactly 900 frames).

WO 00/62552 PCT/US00/10208

Several known methods utilize variations of the following procedure: decoding an
"in stream" and an "out stream" to a baseband or elementary level, performing a splice
operation and re-encoding the resulting spliced stream. These methods provide frame
accurate seamless splices, but at great expense.

In an improved method allowing seamless splicing at the transport stream level,
MPEG and MPEG-like information streams including, e.g., video information may be
spliced together in a relatively seamless manner by defining "in-points" and "out-points" for
each stream that are indicative of, respectively, appropriate stream entry and exit points.
For example, a packet containing a video sequence header in an MPEG-like video stream
comprises an appropriate in-point. An MPEG-like information stream that contains such
in-points and out-points is said to be spliceable. The Society of Motion Picture and Television
Engineers (SMPTE) has proposed a standard, SMPTE 312M, defining such splicing points,
entitled "Splice Points for MPEG-2 Transport Streams," which is incorporated herein by
reference in its entirety.

Unfortunately, the placement of such in-points and out-points is defined by factors
such as image frame encoding mode, group of pictures (GOP) structure and the like.
Therefore, an end user trying to seamlessly splice between information streams cannot do
so in a "frame accurate" manner if the desired splicing points are not appropriate in-points
or out-points.

Therefore, it is seen to be desirable to provide a method and apparatus that allows
seamless, frame accurate splicing of MPEG-like transport streams. Moreover, it is seen to
be desirable to provide a method and apparatus for applying such a seamless, frame
accurate splicing method and apparatus to the particular environment of a television studio
or other video serving environment.

SUMMARY OF THE INVENTION

The invention comprises a method for generating a transition stream and processing
video, audio or other data within the transition stream using, respectively, pixel domain
processing, audio domain processing or other data domain processing. Alternate
embodiments of the invention ensure that non-video data related to image frames forming a
transition stream are included within the transition stream. Multiple and single program
transport stream splicing operations are supported by the invention.

Specifically, in a system for processing transport streams including image frames, a 
method according to the invention for generating a transition stream for transitioning from a 
first transport stream to a second transport stream in a substantially seamless manner 
comprises the steps of: decoding a portion of the first transport stream including at least a 
target out-frame representing a last image frame of the first transport stream to be 
presented; decoding a portion of the second transport stream including at least a target in- 
frame representing a first image frame of the second transport stream to be presented; 
processing, using a pixel domain process, at least one of the decoded image frames; and 
encoding a plurality of the decoded image frames, including the target out-frame and the 
target in-frame, to produce the transition stream. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The teachings of the present invention can be readily understood by considering the 
following detailed description in conjunction with the accompanying drawings, in which: 

FIG. 1 depicts a high level block diagram of a television studio; 

FIGS. 2A and 2B are graphical representations of a splicing operation useful in 
understanding the invention; 

FIG. 3 depicts an embodiment of a play to air server suitable for use in the television 
studio of FIG. 1; 

FIGS. 4A, 4B and 4C are graphical representations of a splicing operation useful in
understanding the invention; 

FIGS. 5 and 6 depict tabular representations of image frame display order and image 
frame transmission orders useful in understanding the invention; 

FIG. 7 depicts a flow diagram of a method for generating a transition stream or 
transition clip; 

FIG. 8 depicts a flow diagram of a method of determining which information frames
within a from-stream should be included within the transition stream;

FIG. 9 depicts a flow diagram of a method for determining which information
frames within a to-stream should be included within the transition stream;

FIG. 10 depicts a flow diagram of a method for indexing an information stream;

FIG. 11 depicts a tabular representation of a meta file suitable for use in the play to
air server of FIG. 3;

FIG. 12 depicts a flow diagram of a method for generating a transition stream or
transition clip incorporating pixel domain effects; and

FIG. 13 depicts a flow diagram of a method for generating a transition stream or
transition clip according to an embodiment of the invention.

To facilitate understanding, identical reference numerals have been used, where 
possible, to designate identical elements that are common to the figures. 

DETAILED DESCRIPTION

After considering the following description, those skilled in the art will clearly
realize that the teachings of the invention can be readily utilized in any information
processing system in which a need exists to perform seamless, frame accurate splicing of,
e.g., MPEG-like transport streams including video sub-streams.

An embodiment of the invention will be described within the context of a television 
studio environment where a play to air controller causes stored video streams (e.g., video 
segments or "clips") to be retrieved from a server and spliced together in a seamless, frame 
accurate manner to produce, e.g., an MPEG-2 compliant video stream suitable for 
transporting to a far end decoder. However, since the scope and teachings of the invention 
have much broader applicability, the invention should not be construed as being limited to 
the disclosed embodiments. For example, the invention has applicability to server-based 
asset streaming for cable headends, insertion of local commercials and trailers for digital 
cinema, frame accurate Internet-based streaming of MPEG-2 transport streams and limited 
production facilities (i.e., those production facilities performing only the composition of 
segments for news or other applications). 

Throughout this description various terms are used to describe the invention. Unless
modified by the following description, several of the terms are defined as follows: A
spliced stream comprises a stream formed by concatenating an exit-stream (or from-stream)
to an entry-stream (or to-stream) at a particular splicing point. An exit-frame is the last
frame of an exit-stream. An entry-frame is the first frame of an entry-stream.

FIG. 1 depicts a high level block diagram of a television studio. Specifically, the
studio of FIG. 1 comprises a play to air server 110, a mass storage device 115, a play to air
controller 120, a router 130 and a network interface device (NID) 140.

The mass storage device 115 is used to store a plurality of, illustratively, MPEG-2
transport streams including encoded video sub-streams and associated audio streams
providing a program. The mass storage device 115 may also be used to store other types of
information streams, such as packetized or non-packetized elementary streams comprising
video data, audio data, program information and other data.

The play to air server 110 retrieves, via signal path S1, information streams from the
mass storage device 115. The retrieved information streams are processed, in response to a
control signal produced by the play to air controller 120 (e.g., a play list), to produce an
output transport stream comprising a plurality of concatenated transport streams. The play
to air server 110 provides the output transport stream and is coupled to the router 130 via
signal path S2.

The play to air controller 120 provides control information to the play to air server
110 and other studio equipment (not shown) via a signal path S3, which is coupled to the
router 130. The router 130 is used to route all control and program information between the
various functional elements of the television studio 100. For example, control information
is passed from the play to air controller 120 via signal path S3 to the router 130, which then
passes the control information to the play to air server 110 via signal path S2. Optionally, a
direct control connection CONTROL between the play to air controller 120 and the play to
air server 110 is used for passing control information.

The router 130 receives the output transport stream from the play to air server 110
via signal path S2 and responsively passes the output transport stream to other studio
components (e.g., editors, off-line storage elements and the like) via signal path S5, or to
the network interface device 140 via signal path S6.

The network interface device (NID) 140 is used to communicate the output transport
stream, control information or any other information between the television studio 100 of
FIG. 1 and other studios (not shown). Optionally, the NID receives information streams
from other studios, remote camera crews, broadcasters and the like. These streams are
coupled to the play to air server 110 for immediate processing into an output transport
stream being produced (e.g., "live" coverage of a news event), for delayed processing or for
storage in the mass storage device (with or without processing).

The play to air server 110 and mass storage device 115 may be implemented using a
compressed bitstream video server such as the Origin 2000 "Play-To-Air/Production
Server" manufactured by SGI of Mountain View, California.

The play to air controller 120 comprises a play list 125 corresponding to the
information streams or clips that are to be scheduled for subsequent incorporation into the
output transport stream of the play to air server 110. The play list 125 includes exact frame
entry and exit locations of each of the information streams or clips that are to be retrieved
from the mass storage device 115 and concatenated or spliced into the output transport
stream by the play to air server. The play list 125 may also identify the first and last frames
for each of the information streams or clips.

The play to air server 110, in response to a control signal from the play to air
controller providing at least portions of the play list 125, retrieves the appropriate streams or
clips from the mass storage device and splices the clips in a seamless, frame accurate
manner according to the frame entry and exit information within the control signal to
produce the output transport stream. Importantly, the output transport stream produced
presents no syntax errors or discontinuities to any other studio component, including any
remote feeds provided by the network interface device 140. The splicing or concatenation
operations performed by the play to air server will be explained in more detail below with
respect to FIG. 2A and FIG. 2B.

FIG. 2A and FIG. 2B are graphical representations of a splicing operation useful in
understanding the invention. Specifically, FIG. 2A graphically depicts a frame accurate,
seamless splicing operation of two 30 frames per second MPEG-2 transport stream clips
(210, 220) using a transition clip (230) to produce a resulting spliced 30 frames per second
MPEG-2 transport stream clip (240). The transition stream 230 is formed using portions of
the first stream 210 and the second stream 220. The resulting spliced stream 240 comprises
the concatenation of portions of the first 210, transition 230 and second 220 streams. The
resulting spliced stream 240 comprises a "knife edge" or frame accurate splice between the
first and second streams at an out-point (210-OUT) of the first stream 210 and an in-point
(220-IN) of the second stream 220.

FIG. 2B depicts various SMPTE timecodes associated with the streams or clips
depicted in FIG. 2A. The first stream or clip 210 (STREAM A) comprises a plurality of
frames including a first frame 210-ST beginning at a time t0, illustratively at a respective
SMPTE timecode of 00:00:00:00; a transition out frame 210-TRANS beginning at time t1;
an out-frame 210-OUT ending at a time t2, illustratively at a respective SMPTE timecode of
00:00:02:13; and a last frame 210-END starting at a time greater than time t2.

The out-frame 210-OUT comprises the last frame of the first stream 210 to be
displayed (i.e., the frame immediately preceding the desired splice point). The out-frame
210-OUT will be included within the transition stream 230. The transition out frame
210-TRANS comprises the last frame of the first stream 210 to be transmitted. That is, the
transition stream 230 will be concatenated to the first stream 210 immediately after the
transition out frame 210-TRANS.

The second stream or clip 220 (STREAM B) comprises a plurality of frames
including a first frame 220-ST beginning at a respective SMPTE timecode of 00:00:00:00;
an in-frame 220-IN beginning at time t2, illustratively at a respective SMPTE timecode of
00:00:00:23; a transition in frame 220-TRANS beginning at time t3; and a last frame
220-END ending at a time t4, illustratively a respective SMPTE timecode of 00:00:04:17.

The in-frame 220-IN comprises the first frame of the second stream 220 to be
displayed (i.e., the frame immediately following the desired splice point). The in-frame
220-IN will be included within the transition stream 230. The transition in frame
220-TRANS comprises the first frame of the second stream 220 to be transmitted. That is,
the transition in frame 220-TRANS will be the first frame of the second stream 220
concatenated to the transition stream 230.
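
The concatenation described above can be illustrated with a minimal sketch. It assumes each stream is already available as a sequence of transport packets and that the exit and entry packet positions (the ends of 210-TRANS and the start of 220-TRANS) have been determined elsewhere; the function name `assemble_spliced_stream` and its parameters are hypothetical and do not appear in the disclosure.

```python
from typing import List, Sequence


def assemble_spliced_stream(
    from_packets: Sequence[bytes],
    transition_packets: Sequence[bytes],
    to_packets: Sequence[bytes],
    exit_packet: int,
    entry_packet: int,
) -> List[bytes]:
    """Concatenate from-stream, transition clip and to-stream.

    exit_packet  -- index of the last from-stream packet to play
                    (assumed to be the end of the transition out frame).
    entry_packet -- index of the first to-stream packet to play
                    (assumed to be the start of the transition in frame).
    """
    if not 0 <= exit_packet < len(from_packets):
        raise ValueError("exit_packet is outside the from-stream")
    if not 0 <= entry_packet < len(to_packets):
        raise ValueError("entry_packet is outside the to-stream")
    # Play the from-stream up to and including the exit packet, then the
    # whole transition clip, then the to-stream from the entry packet on.
    return (
        list(from_packets[: exit_packet + 1])
        + list(transition_packets)
        + list(to_packets[entry_packet:])
    )
```

The frames between 210-TRANS and 210-OUT, and between 220-IN and 220-TRANS, are not taken from the source streams in this sketch; they are expected to be carried by the recoded transition clip.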

The transition stream or clip 230 (STREAM T) is a data structure well adapted to
providing seamless, frame accurate splicing of video streams. The transition stream or clip
230 (STREAM T) comprises a plurality of frames including a first frame 230-ST beginning
at a time t1; and a last frame 230-END ending at time t3. The transition clip comprises
frames from both the first stream 210 and the second stream 220, including the respective
in- and out-frames. The beginning and end of the transition clip are depicted in FIG. 2B as,
respectively, times t1 and t3. It must be noted that these times, and the actual first and last
frames of the transition stream, will be determined according to methods that will be
described below with respect to FIGS. 8 and 9.

The resulting spliced stream 240 comprises a plurality of frames including a first
frame 240-ST beginning at time t0, illustratively a respective SMPTE timecode of
00:00:00:00; and a last frame 240-END ending at time t4, illustratively a respective SMPTE
timecode of 00:00:04:17. The spliced stream 240 comprises 73 frames from the first clip
210 (i.e., t0 through t2) and 115 frames from the second clip 220 (i.e., t2 through t4).
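
The frame counts quoted for the spliced stream follow from the SMPTE timecodes of FIG. 2B at 30 frames per second. The sketch below shows the arithmetic; it assumes non-drop-frame timecode in `HH:MM:SS:FF` form, and the helper names are illustrative only. The 115-frame figure is reproduced only under the further assumption that both second-stream timecodes (00:00:00:23 and 00:00:04:17) are counted inclusively.

```python
def timecode_to_frames(timecode: str, fps: int = 30) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError("frame field must be smaller than the frame rate")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def inclusive_frame_count(first_tc: str, last_tc: str, fps: int = 30) -> int:
    """Number of frames from first_tc to last_tc, counting both end frames."""
    return timecode_to_frames(last_tc, fps) - timecode_to_frames(first_tc, fps) + 1


# Out-point of the first clip: 2 s * 30 + 13 = 73 frames.
first_clip_frames = timecode_to_frames("00:00:02:13")
# In-frame through last frame of the second clip, counted inclusively.
second_clip_frames = inclusive_frame_count("00:00:00:23", "00:00:04:17")
```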

The spliced stream 240 depicted in FIG. 2A comprises the first 210 and second 220
streams concatenated in a manner (using the transition stream 230) to effect a knife edge
splice (spliced stream 240 timecode 00:00:02:13) where the first stream 210 is apparently
exited at the out-frame 210-OUT and the second stream 220 is apparently entered at the
in-frame 220-IN. Utilizing the present invention, this splicing operation occurs in a frame
accurate manner, regardless of the frame type of the out (exit) and in (entry) frames.

It should be noted that under ideal splicing conditions (discussed in the SMPTE
312M splicing standard) it is possible that no transition clip is required. However, under
most conditions, the transition clip will contain multiple frames rather than the "empty
frame" transition clip that may be generated under the ideal conditions.

FIG. 3 depicts an embodiment of a play to air server suitable for use in the television
studio of FIG. 1. Specifically, the exemplary play to air server 110 of FIG. 3 comprises an
input/output (I/O) circuit 310, support circuitry 330, a processor 320, a memory 340 and an
optional time base corrector 350. The processor 320 cooperates with conventional support
circuitry 330 such as power supplies, clock circuits, cache memory and the like as well as
circuits that assist in executing the various software routines within the play to air server
110. The play to air server 110 also includes input/output circuitry 310 that forms an
interface between the play to air server 110 and the mass storage device 115 and router 130.

The memory 340 includes programs and other information suitable for
implementing the invention. Specifically, the memory 340 is used to store programs that,
when executed by the processor 320, perform an index generation function 342, a transition
clip generation function 344 and, optionally, a transition clip time restamping function 345.
Optionally, the memory 340 includes one or both of an index library 346 and a stream
library 348.

To provide a splicing operation such as described above with respect to FIGS. 2A
and 2B, the invention utilizes the transition clip generation function 344. The transition clip
generation function 344 generates a transition clip, such that it is possible to exit the first
stream 210 at a first prescribed Transport Packet boundary (determined by, e.g., the
transition stream generator), run the generated transition clip 230, and then enter the second
stream 220 at a second prescribed Transport Packet boundary. The actual exit
(210-TRANS) and entry (220-TRANS) points to the first 210 and second 220 stream will
typically not correspond to the actual frames that were requested. Rather, the transition clip
will be constructed using some number of frames immediately before the splice required
exit point 210-OUT of the first stream 210, and some number of frames immediately after
the splice required entry point 220-IN of the second stream 220.

The invention selects frames to be included in the transition stream in a manner that, 
preferably, optimizes the quality of the inter-stream transitions. That is, even though a 
splicing operation is performed in a frame accurate and seamless manner, it is possible for 
the splicing operation to result in qualitative degradation of video information near the 
splicing points. This is caused by "bit starving" or other coding anomalies resulting from, 
e.g., mismatched video buffering verifier (VBV) levels. The invention adapts the VBV
levels to minimize such anomalies.

The index generation function 342 will now be described in detail. Two types of 
information are used to build a transition clip, frame data and MPEG data. Frame data 
comprises information such as the location, coding type and presentation order of particular 
frames in the from- and to-streams. Frame data is used to determine which frames within 
the from-stream and the to-stream are to be recoded to produce the transition clip. MPEG 
data comprises information such as frame dimensions, bit rate, frame versus field formats, 
video buffering verifier (VBV) delay, chrominance sampling formats and the like. MPEG 
data is used to specify the MPEG encoding characteristics of the transport stream. The 
transition clip is preferably encoded or recoded using the same MPEG parameters as the 
input TS.

To assist in the generation of transition clip(s) by the transition clip generation
function 344, the index generation function 342 is used to process each of the transport
streams to be spliced to determine several parameters associated with each frame within the
transport streams. The determined parameters are stored in a meta file, such that each
transport stream has an associated meta file, e.g., within the index library 346.

In the exemplary embodiment, the index generation function 342 determines, for
each respective video frame in a transport encoded video stream, the following:

1) the current picture number (in display order); 

2) the picture coding type (I-, P- or B-frame);

3) the number of the transport packet containing the start of the frame; 

4) the number of the transport packet containing the end of the frame; 

5) the presentation time stamp (PTS) of the frame; 

6) the decode time stamp (DTS) of the frame; 

7) the number of the transport packet containing the start of the sequence header
preceding the frame;

8) the number of the transport packet containing the start of the picture header preceding
the frame; and

9) any indicia of the frame comprising an appropriate in-frame or out-frame, such as
provided by frame markings according to the SMPTE 312M splicing syntax.

In addition to the per-frame data, the index generation function 342 optionally saves
the fields for common MPEG-2 structures such as sequence headers, picture headers and the
like.
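
The nine per-frame parameters listed above map naturally onto a record type. The following is a minimal sketch of such a record, not the meta file format of the disclosure; the class name, field names and the encoding of the splice marking as `"in"`, `"out"` or `None` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrameIndexRecord:
    """One hypothetical index entry per video frame."""

    picture_number: int            # 1) position in display order
    coding_type: str               # 2) "I", "P" or "B"
    start_packet: int              # 3) transport packet holding the frame start
    end_packet: int                # 4) transport packet holding the frame end
    pts: int                       # 5) presentation time stamp (90 kHz ticks)
    dts: int                       # 6) decode time stamp (90 kHz ticks)
    sequence_header_packet: Optional[int]  # 7) None if no header was seen yet
    picture_header_packet: int     # 8) packet holding the picture header start
    splice_marking: Optional[str] = None   # 9) "in", "out" or None (assumed encoding)

    def __post_init__(self) -> None:
        if self.coding_type not in ("I", "P", "B"):
            raise ValueError("coding_type must be 'I', 'P' or 'B'")
        if self.end_packet < self.start_packet:
            raise ValueError("end_packet precedes start_packet")

    @property
    def is_anchor(self) -> bool:
        """I- and P-frames serve as anchors for predicted frames."""
        return self.coding_type in ("I", "P")
```

Keeping the record immutable suits an index that is written once at storage time and only read during splicing.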

Thus, the stream library 348 (or mass storage device 115) comprises transport
streams that have been processed according to the index generation function 342. An
embodiment of the index generation function 342 will be described below with respect to
FIG. 10.

Since parsing a transport stream can be time consuming, one embodiment of the
invention utilizes pre-indexing. That is, transport streams stored within the mass storage
device 115 or stream library 348 are processed by the index generation function 342 at the
time of storage or as soon as possible thereafter. In this manner the time required to build
transition clips is greatly reduced since there is no need to parse transport streams at the
time of splicing to determine frame and MPEG parameters of the streams. In addition, the
play to air server 110 optionally utilizes the meta files stored within the mass storage device
115 or index library 346 to quickly retrieve characteristics of a transport stream that may be
needed for scheduling and other functions, such as frame rate.

FIG. 10 depicts a flow diagram of a method for indexing an information stream.
Specifically, FIG. 10 depicts a flow diagram of a method 1000 suitable for use in the index
generation function 342 of the play to air server 110 of FIG. 3. The method 1000 of
FIG. 10 is suitable for use in implementing step 705 of the method 700 of FIG. 7.

The method 1000 is entered at step 1005, when an information stream to be indexed 
is received. The method 1000 then proceeds to step 1010. 

At step 1010 the transport layer of the information stream to be indexed is parsed.
That is, the header portion of each transport packet within the information stream to be
parsed is examined to identify a transport packet number (tr), the presence or absence of a
sequence header within the transport packet, the presence or absence of a picture header
within the transport packet, the presence or absence of a SMPTE 312M splicing syntax
indication of a splicing in-frame or a splicing out-frame and other information. The method
1000 then proceeds to step 1015.
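
As an illustration of transport layer parsing, the sketch below numbers the fixed-size packets of an MPEG-2 transport stream and decodes the four-byte packet header (sync byte 0x47, payload unit start indicator, 13-bit PID, adaptation field control and continuity counter). It is a simplified example rather than the indexer of the disclosure: the function names are hypothetical, and locating sequence headers, picture headers or splicing syntax additionally requires examining the adaptation field and payload, which is not shown.

```python
from typing import Dict, Iterator, Tuple

TS_PACKET_SIZE = 188  # bytes per MPEG-2 transport packet
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> Dict[str, int]:
    """Decode the fixed 4-byte header of one transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport packet must be 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return {
        "transport_error": (packet[1] >> 7) & 0x1,
        "payload_unit_start": (packet[1] >> 6) & 0x1,
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "adaptation_field_control": (packet[3] >> 4) & 0x3,
        "continuity_counter": packet[3] & 0x0F,
    }


def iter_packets(data: bytes) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (transport packet number, header fields) for each whole packet."""
    whole_packets = len(data) // TS_PACKET_SIZE
    for number in range(whole_packets):
        offset = number * TS_PACKET_SIZE
        yield number, parse_ts_header(data[offset : offset + TS_PACKET_SIZE])
```

The packet number yielded here corresponds to the transport packet number that the index records for the start and end of each frame.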

At step 1015 the first or present frame is examined. That is, the information stream 
to be indexed is parsed down to the packetized elementary stream (PES) layer to examine 
the first video frame of the video elementary stream included within the information stream 
to be indexed. The method 1000 then proceeds to step 1020. 

At step 1020 various parameters associated with the frame examined in step 1015
are determined. Specifically, referring to FIG. 1020-D, step 1020 determines the current
picture number (in display order), the picture coding type (I-, P- or B-frame), the number of
the transport packet containing the start of the frame, the number of the transport packet
containing the end of the frame and the presentation time stamp (PTS) and decode time
stamp (DTS) of the frame. As previously noted with respect to step 1010, the transport
packet containing the start of the sequence header preceding the frame has been noted, the
number of the transport packet containing the start of the picture header preceding the frame
has been noted and any indicia of the frame comprising an appropriate in-frame or
out-frame, such as provided by frame markings according to the SMPTE 312M splicing
syntax, have been noted. Additionally, at step 1020 the quantities CBd and Bd are also
determined. The method 1000 then proceeds to step 1025.

The quantity Bd is a buffer delay as marked in the stream. This is the amount of
time the first bit of a picture remains in the VBV buffer. The quantity CBd is the calculated
buffer delay. The indexer calculates this value as indicated in Annex C of the MPEG-2
specification. The buffer delay Bd and calculated buffer delay CBd should match, but if the
input stream is improperly marked the two quantities may differ. The buffer delay value is
used by the invention to determine how to adjust the VBV levels between 210-TRANS and
220-TRANS.

The VBV level adjustment is done in the transition clip.
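
Because the marked and calculated buffer delays should agree, an indexer can flag frames where they do not. The sketch below shows only that comparison; it does not perform the Annex C calculation itself. The function name, the tuple layout and the tolerance parameter are assumptions, and the delays are taken to be in 90 kHz clock ticks, as with the MPEG-2 `vbv_delay` field.

```python
from typing import Iterable, List, Tuple


def buffer_delay_mismatches(
    records: Iterable[Tuple[int, int, int]],
    tolerance_ticks: int = 0,
) -> List[int]:
    """Return picture numbers whose marked delay differs from the calculated one.

    records         -- (picture_number, Bd, CBd) tuples, delays in 90 kHz ticks.
    tolerance_ticks -- largest difference still treated as a match (assumed
                       parameter; the disclosure does not specify one).
    """
    return [
        picture_number
        for picture_number, marked_delay, calculated_delay in records
        if abs(marked_delay - calculated_delay) > tolerance_ticks
    ]
```

A non-empty result suggests an improperly marked input stream, in which case a splicer might prefer the calculated value when planning the VBV adjustment.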

At step 1025 the index information is stored in, e.g., the
mass storage device 115 or the index library 346. The method 1000 then proceeds to step
1030.

At step 1030 a query is made as to whether more frames are to be processed. If the
query is answered negatively, then the method 1000 proceeds to step 1040 where it is
exited. If the query is answered affirmatively, then the method 1000 proceeds to step 1035
where the next frame is queued, and to step 1015, where the next queued frame is
examined.

FIG. 11 depicts a tabular representation of a meta file suitable for use in the index
library 346 of FIG. 3. Specifically, the table 1100 of FIG. 11 comprises a plurality of
records (1-54), each record being associated with a respective starting transport packet field
1110, packetized elementary stream identification field 1120, frame and frame type
identification field 1130, PTS field 1140, DTS field 1150, Bd field 1160, CBd field 1170 and
marked splice point field 1180.

In one embodiment of the invention, the index generation function 342 is not used
prior to receiving and/or splicing transport streams. In this embodiment, frame selection is 
accomplished using a single-pass processing of at least a portion of each transport stream to 
be spliced to determine several parameters related to the from-stream and to-stream. 

For both the from-stream and the to-stream, the following parameters are
determined: the transport packet offsets of the sequence_header and picture_header at
which to begin decoding; the number of frames to decode; and the number of decoded
frames to discard (e.g., anchor frames needed to decode frames to be included in the
transition clip).

For the from-stream only, the following parameters are determined: the last transport
packet to play from the from-stream (i.e., the new exit point or exit frame); and the PTS of
the first frame to display in the transition clip.

For the to-stream only, the following parameters are determined: the starting and ending
transport packets for the I-frame to copy to the transition clip; the starting and ending
transport packets for the remaining GOP to copy to the transition clip; the first transport
packet to play from the to-stream (i.e., the new entry point or entry frame); and the number
of frames to be copied.

In addition, since the indexing library retrieves MPEG fields as it parses a transport 
stream, all required recoding parameters are also saved during frame selection. 

The transition clip generation function 344 will now be described in detail. The
process of constructing a transition clip comprises the steps of 1) determining which frames
to include in the transition clip; 2) decoding the frames to be included in the transition clip;
3) encoding or recoding the frames forming the transition clip; and 4) transport encoding
(i.e., packetizing) the transition clip.

Frame selection affects the size of the output transition clip and the amount of time
required to generate the transition, and places constraints on the encoder in terms of
optimizing the quality of the recoded video. The frame selection method discussed herein
resolves the issues of frame dependencies while reducing the frame count and still allowing
enough transition time to recode the video without significant loss of quality.

The encoding or recoding step is typically the most time consuming step in the
transition clip generation function 344, so reducing the number of frames to recode provides
a time saving. However, since one of the primary reasons for building a transition clip is to
reconcile differences in VBV levels between the two transport streams being spliced, [...]
while adjusting the VBV level (especially when decreasing it), since frames must be [...]
buffer than are taken out. This requires the encoder to use fewer bits per picture (on
average).

FIG. 5 depicts a tabular representation of image frame display order and image
frame transmission order useful in understanding the invention. Specifically, FIG. 5 depicts
a first tabular representation 510 depicting the display order of, illustratively, 24 encoded
image frames forming a portion of a video sequence and a second tabular representation
520 depicting the transmission order of the 24 image frames forming the video sequence.
For purposes of this discussion, the video sequence depicted in FIG. 5 comprises a portion
of a from-stream video sequence (i.e., the first displayed sequence in a spliced sequence),
such as described above with respect to the first stream 210 of FIG. 2.

Specifically, per the first tabular representation 510, the image frames are displayed 
and encoded according to a group of pictures (GOP) structure as follows (from frame 1 to 
frame 24): 

I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-P-B-B-I-B-B-P-B-B. 

Additionally, per the second tabular representation 520, the image frames are 
transmitted in the following frame order: 

1-4-2-3-7-5-6-10-8-9-13-11-12-16-14-15-19-17-18-22-20-21-25-23. 

It is assumed, for purposes of the following discussion, that it is desired to exit the 
video sequence depicted in FIG. 5 at frame 15, which comprises a B-frame. That is, frame 
15 comprises the out-frame of the exit stream depicted in FIG. 5. As will be discussed 
below, frames 10 through 15 will be decoded (in display order). It should be noted that 
frame 16 is the previous anchor frame to frame 15 in transmission order. Therefore, it is 
necessary to decode frame 16 prior to decoding frames 14 and 15. 

The last frame in the from-clip prior to the transition clip will be frame 13. That is, the 
from-clip will be exited immediately before frame 16. 
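
By way of illustration only, the relationship between the display order and the 
transmission order tabulated for FIG. 5 may be sketched as follows. The function name 
transmission_order and the list representation of the GOP are assumptions introduced for 
this sketch; the sketch assumes only that each B-frame is transmitted after the anchor 
frame that follows it in display order. 

```python
def transmission_order(frame_types):
    """Map display-order frame types to 1-based frame numbers in transmission order.

    Each B-frame is held back until the anchor frame (I- or P-frame) that
    follows it in display order has been emitted.
    """
    order, held_b = [], []
    for number, frame_type in enumerate(frame_types, start=1):
        if frame_type == "B":
            held_b.append(number)
        else:
            order.append(number)   # the anchor frame goes first
            order.extend(held_b)   # then the B-frames that precede it in display order
            held_b.clear()
    # Trailing B-frames wait on an anchor outside this list; emit them last here.
    order.extend(held_b)
    return order


# First 22 frames of the FIG. 5 sequence (the sequence is cut at an anchor frame).
gop = "I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-P-B-B-I-B-B-P".split("-")
print("-".join(str(n) for n in transmission_order(gop)))
```

The output reproduces the first 22 entries of the transmission order listed above; the 
final B-frames of the 24 frame sequence depend on the first anchor frame of the next GOP 
and are therefore left out of the example. 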

FIG. 6 depicts a tabular representation of image frame display order and image 
frame transmission order useful in understanding the invention. Specifically, FIG. 6 depicts 
a first tabular representation 610 depicting the display order of, illustratively, 26 encoded 
image frames forming a portion of a video sequence and a second tabular representation 
620 depicting the transmission order of the 26 image frames forming the video sequence. 
For purposes of this discussion, the video sequence depicted in FIG. 6 comprises a portion 
of a to-stream video sequence (i.e., the second displayed sequence in a spliced sequence), 
such as described above with respect to the second stream 220 of FIG. 2. 

Specifically, per the first tabular representation 610, the image frames are displayed 
and encoded according to a group of pictures (GOP) structure as follows (from frame 1 to 
frame 26): 



I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-I-B. 

Additionally, per the second tabular representation 620, the image frames are 
transmitted in the following frame order: 

1-4-2-3-7-5-6-10-8-9-13-11-12-16-14-15-19-17-18-22-20-21-25-23-24-28. 

It is assumed, for purposes of the following discussion, that it is desired to enter the 
video sequence depicted in FIG. 6 at frame 15, which comprises a B-frame. That is, frame 
15 comprises the in-frame of the entry stream depicted in FIG. 6. As will be discussed 
below, frames 10 through 18 will be decoded (in display order). It should be noted that the 
first frame to be displayed from the to-stream is frame 25 (an I-frame that is not included in 
the transition clip). 

FIG. 7 depicts a flow diagram of a method for generating a transition stream or 
transition clip. Specifically, FIG. 7 depicts a flow diagram of a method 700 suitable for use 
in the transition clip generation function 344 of the play to air server 110 of FIG. 3. 

The method 700 is entered at step 705, where a "from-stream" and "to-stream" are 
annotated. That is, the information stream providing the information prior to a splice point 
(the from-stream) and the information stream providing information subsequent to the 
splice point (the to-stream) are annotated to identify, on a frame-by-frame basis, various 
frame parameters as described above with respect to the index generation function 342. A 
method for annotating an information stream is described above with respect to FIG. 10. 
The method 700 then proceeds to step 710. 

At step 710 a portion of the from-stream prior to the exit frame is decoded. That is, 
a plurality of information frames within the from-stream including the exit frame (i.e., the 
last information frame within the from-stream to be displayed) are decoded. The method 
700 then proceeds to step 715. 

At step 715 a portion of the to-stream starting at the entry frame is decoded. That is, 
information frames within the to-stream beginning with the entry frame (i.e., the first frame 
of the to-stream to be displayed) are decoded. The method 700 then proceeds to step 720. 

At step 720 the decoded portions of the from-stream and to-stream are re-encoded to 
produce a transition clip or transition stream. The transition clip or transition stream 
comprises a transport stream including, e.g., video and audio information associated with 
the from-stream and to-stream. 

The transition stream or transition clip generated by the method 700 of FIG. 7 is 
used as a transition between the from-stream and the to-stream by, e.g., the play to air 
server 110 of FIGS. 1 and 3. 



A. Frame Selection. 



The first step in the process of constructing a transition clip or transition stream 
comprises the step of defining which frames to include in the transition clip (i.e., the 
frame selection process). 

FIG. 8 depicts a flow diagram of a method of determining which information frames 
within a from-stream should be included within the transition stream. The method 800 of 
FIG. 8 is suitable for use in implementing step 710 of the method 700 of FIG. 7. 

The method 800 is entered at step 805, where the exit frame of the from-stream is 
identified. The exit frame of the from-stream is the last frame within the from-stream to be 
displayed prior to a splice point. For example, referring now to the from-stream depicted in 
FIG. 5, the exit frame (frame 15) comprises a B-frame denoted as frame 513. The method 
800 then proceeds to step 810. 

At step 810 the method 800 decodes, in display order, the exit frame and the 
immediately preceding non-anchor frames. That is, referring again to FIG. 5, the exit frame 
(frame 15) and the immediately preceding non-anchor frames (frames 11, 12, 13 and 14) are 
decoded. Since frames 11, 12 and 13 are predicted using frame 10, it is necessary to also 
decode frame 10. However, the decoded frame 10 may be discarded after frames 11-13 
have been decoded. That is, all frames from the I-frame preceding the exit frame in display 
order up to and including the exit frame are decoded. It is necessary to start from the 
I-frame because the I-frame has no frame dependencies (i.e., it can be decoded without first 
decoding any other frames). The method 800 then proceeds to step 815. 
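
By way of illustration only, the decode range of step 810 may be sketched as follows. 
The function name from_stream_decode_range and the use of a display-order list of frame 
types are assumptions made for this sketch and are not part of the described embodiment. 
The sketch does not model the forward anchor frame (frame 16 in FIG. 5) that an exit 
B-frame also requires. 

```python
def from_stream_decode_range(frame_types, exit_frame):
    """Return the 1-based display-order frame numbers decoded for the from-stream.

    The range runs from the I-frame preceding the exit frame, in display
    order, up to and including the exit frame.
    """
    if not 1 <= exit_frame <= len(frame_types):
        raise ValueError("exit_frame is outside the sequence")
    start = exit_frame
    while frame_types[start - 1] != "I":
        start -= 1
        if start < 1:
            raise ValueError("no I-frame precedes the exit frame")
    return list(range(start, exit_frame + 1))


# Display-order GOP structure of FIG. 5, exited at frame 15 (a B-frame).
fig5 = "I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-P-B-B-I-B-B-P-B-B".split("-")
print(from_stream_decode_range(fig5, 15))  # frames 10 through 15
```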

At step 815 a query is made as to whether the exit frame is a B-frame. If the query 
at step 815 is answered negatively, then the method proceeds to step 820. If the query at 
step 815 is answered affirmatively, then the method 800 proceeds to step 825. 

At step 820, since the exit frame is either an I-frame or a P-frame, the last from-stream 
frame to be displayed (i.e., the transition frame) prior to the transition stream frames is the 
frame immediately preceding, in transmission order, the exit frame. That is, if frame 15 of 
the from-stream depicted in FIG. 5 was a P-frame or I-frame rather than a B-frame, then the 
last from-stream frame to be displayed would be frame 14. If the exit frame is an I- or 
P-frame, frame dependencies and reordering make it possible to leave the transport 
immediately before the next anchor frame (i.e., after all B-frames that are dependent on the 
exit frame). While this reduces the number of frames to recode, it also reduces the 
opportunity to adjust VBV levels for the transition. The method 800 then proceeds to step 
830. 

At step 825 if the exit frame is a B-frame (such as the exit frame in the from-stream 
depicted in FIG. 5), then the last from-stream frame to be displayed is the frame 
immediately preceding, in transmission order, the preceding anchor frame. Referring now 
to FIG. 5, the preceding anchor frame with respect to the exit frame is a P-frame (frame 13). 
It should be noted that the last frame to be transmitted of the 24 frame sequence depicted in 
FIG. 5 is the B-frame 12, while the last frame to be displayed is the P-frame 13. The 
method 800 then proceeds to step 830. 

At step 830 the decoded frames following, in display order, the last from-stream 
frame (e.g., the B-frame denoted as frame 12 in FIG. 5) are stored in the transition clip. It 
should be noted that the transition stream or clip will also include frames from the 
to-stream. All of the frames that are stored within the transition clip will then be re-encoded 
to form an encoded transition clip or transition stream. 

FIG. 9 depicts a flow diagram of a method for determining which information 
frames within a to-stream should be included within the transition stream. Specifically, the 
method 900 of FIG. 9 is suitable for use in implementing step 715 of the transition stream 
generation method 700 of FIG. 7. 

The method 900 is entered at step 905, where the entry frame of the to-stream is 
identified. The entry frame of the to-stream is the first frame within the to-stream to be 
displayed after a splice point. For example, referring now to the to-stream depicted in 
FIG. 6, the entry frame (frame 15) comprises a B-frame. The method 900 then proceeds to 
step 910. 

At step 910 the entry frame and all frames appearing before the next I-frame, in 
display order, are decoded. That is, referring to FIG. 6, the entry frame (frame 15) and all 
frames (i.e., frames 16, 17 and 18) appearing before the next I-frame (frame 19) are 
decoded. Since frames 17 and 18 in the to-stream video sequence depicted in FIG. 6 are 
predicted using information from the next I-frame (frame 19), it is necessary to also decode 
the next I-frame. However, the decoded frame 19 may be discarded after frames 17 and 18 
have been decoded. The method 900 then proceeds to step 915. 
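
By way of illustration only, the selection of step 910 may be sketched as follows. The 
function name to_stream_selection, the dictionary keys and the list representation are 
assumptions made for this sketch. The sketch also reports the I-frame at which decoding 
must begin, consistent with the statement above that frames 10 through 18 of FIG. 6 are 
decoded. 

```python
def to_stream_selection(frame_types, entry_frame):
    """Select to-stream frames (1-based, display order) for the transition clip.

    Returns the I-frame at which decoding starts, the frames to be recoded
    (entry frame up to, but not including, the next I-frame) and the next
    I-frame itself.
    """
    count = len(frame_types)
    if not 1 <= entry_frame <= count:
        raise ValueError("entry_frame is outside the sequence")
    next_i = next(
        (n for n in range(entry_frame + 1, count + 1) if frame_types[n - 1] == "I"),
        None,
    )
    if next_i is None:
        raise ValueError("no I-frame follows the entry frame")
    decode_start = entry_frame
    while frame_types[decode_start - 1] != "I":
        decode_start -= 1
        if decode_start < 1:
            raise ValueError("no I-frame precedes the entry frame")
    return {
        "decode_start": decode_start,
        "recode": list(range(entry_frame, next_i)),
        "next_i_frame": next_i,
    }


# Display-order GOP structure of FIG. 6, entered at frame 15 (a B-frame).
fig6 = "I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-P-B-B-I-B-B-P-B-B-I-B".split("-")
print(to_stream_selection(fig6, 15))
```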

At step 915 the next I-frame (e.g., frame 19 of video sequence 610) is copied to the 
transition clip. That is, the video information within the transport packets forming the 
to-stream (i.e., the video elementary stream information) is extracted from the transport 
packets and copied to the transition clip. It is noted that the output of the encoder is a video 
elementary stream (VES) such that the output from the encoder may be copied directly to 
the transition clip. The transition clip will be subsequently packetized. The method 900 
then proceeds to step 920. 

At step 920 the frames (e.g., frames 20 through 22) between the next I-frame (e.g., 
frame 19) and the following I-frame (frame 25) are also copied, in transmission order, to the 
transition clip. It must be noted that the frames copied to the transition clip in steps 915 and 
920 (e.g., frames 19-21) are copied to the transition clip as encoded frames. Thus, the 
method 900 adds to the transition clip decoded frames comprising the entry frame and all 
frames appearing before the next I-frame, and encoded frames comprising the next I-frame 
and all frames between the next I-frame and the following I-frame. 

The from-stream and to-stream frame selection methods described above with 
respect to FIGS. 8 and 9 allow for frame dependencies between the transition stream frames 
and those in one or both of the from-stream and to-stream. The following constraints 
should be observed. The transition clip is encoded as a closed GOP structure. That is, the 
transition clip is a self-contained video clip. The transport stream being exited will not 
reference any frames in the transition clip. If the transport stream being entered is coded 
using an open GOP structure, then it may contain frames that reference frames in the 
transition clip. 

An important aspect of the invention is the processing of the transition clip to 
appropriately address frame dependencies of frames that are included within the transition 
clip. A frame dependency comprises, e.g., a predicted frame within the transition clip (i.e., 
a P-frame or B-frame) that must be decoded using an anchor frame from outside of the 
transition clip. While it is desirable to create a transition clip in which there are no external 
frame dependencies (i.e., a "self contained" clip), the invention is capable of producing an 
MPEG compliant transition clip including such frame dependencies. 

B. Decoding. 

The second step in the process of constructing a transition clip or transition stream 
comprises the step of decoding the frames selected in the frame selection process. The 
decoding of the selected frames may be effected using standard hardware or software 
decoding techniques. 

It should be noted that, regardless of which frames are to be decoded, decoding must 
begin at an I-frame. As an artifact of the use of prediction in MPEG encoding, every 
non-I-frame is ultimately dependent on the previous I-frame. The above-described frame 
selection methods break these dependencies in order to enable frame accurate, seamless 
splicing between transport streams. 

C. Encoding. 



The third step in the process of constructing a transition clip or transition stream 
comprises the step of encoding the decoded frames resulting from the frame selection and 
decoding processes. The encoding of the selected frames may be effected using standard 
hardware or software encoding techniques. 

In addition to breaking frame dependencies (as noted above), one of the primary 
objectives when generating a transition clip is to adjust the VBV levels between the 
from-stream and to-stream such that a far-end decoder processing the resulting spliced 
stream will not suffer overflow, underflow or other undesirable decoder buffer 
memory behavior. For example, if the VBV level at the exit point of the from-stream is 
lower than the VBV level at the entry point of the to-stream, then underflow may result 
downstream from the splice. In typical decoders this will result in "freeze frames" while 
the decoder waits for data to become available. A much more serious problem occurs when 
the VBV level at the exit point of the from-stream is higher than the VBV level of the entry 
point of the to-stream. This may result in a VBV overflow downstream from the splice. An 
overflow occurs when more data is available than can be buffered. Overflows result in lost 
and/or corrupted data and typically cause visual artifacts in the decoded pictures and can 
even cause a decoder to reset. 

After the selected frames have been decoded to baseband, they are recoded into a 
VES. The inventors used a Sarnoff Corporation DTV/MPEG-2 Software Encoder to ensure 
high overall performance, picture quality and modularity. The rate control algorithm in the 
encoder was modified to allow specification of initial and ending VBV levels, while the 
input module of the encoder was updated to support the output file format of the decoder. 
The MPEG encoding parameters that were parsed from the transport stream during frame 
selection are passed to the encoder to ensure that the recoded video is compatible with the 
clips being spliced. 

With respect to rate control (which ultimately determines overall picture quality of 
the recoded portion of the transition clip), when adjusting the VBV level upwards, the 
selected frames are coded using fewer bits than the original streams. While increasing the 
VBV level may result in some loss of quality in the resulting output, due to masking in the 
human visual system, a small degradation in video quality at a scene change is often 
imperceptible to a viewer. The inventors have determined that such visual degradation 
imparted to a stream including a frame accurate, seamless splice does not result in a 
perceptible level of video degradation. 
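
By way of illustration only, the relationship between a desired change in VBV level 
and the average number of bits per picture may be sketched as follows. The sketch 
assumes a simplified constant bit rate buffer model in which the buffer fills at the 
stream bit rate and each picture is removed once per frame period; this model, the 
function name average_bits_per_picture and the numeric values are assumptions and are 
not taken from the encoder described above. 

```python
def average_bits_per_picture(bit_rate, frame_rate, start_level, end_level, frame_count):
    """Average picture size (bits) that moves the buffer from start_level to end_level.

    Over frame_count pictures the buffer receives frame_count * bit_rate /
    frame_rate bits, so the level changes by that amount minus the bits removed.
    """
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    nominal = bit_rate / frame_rate                      # bits arriving per frame period
    target = nominal - (end_level - start_level) / frame_count
    if target <= 0:
        raise ValueError("too few frames to reach the requested VBV level")
    return target


# Raising the level by 300,000 bits over 6 frames of a 6 Mb/s, 30 frame/s stream.
print(average_bits_per_picture(6_000_000, 30, 400_000, 700_000, 6))
```

Under these assumptions, raising the VBV level requires pictures that are smaller on 
average than the nominal 200,000 bits per frame period, which is consistent with the 
statement above that the selected frames are coded using fewer bits. 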

In one embodiment of the invention, the from-stream and to-stream each comprise 
transport streams having respective video buffering verifier (VBV). The invention 
determines if a difference exists between the from-stream VBV and the to-stream VBV and 
responsively adapts the re-encoding process to such a difference, as necessary. For 
example, the invention may adapt the re-encoding process by increasing a rate control bit 
allocation in response to a determination that the from-stream VBV exceeds the to-stream 
VBV by a first threshold level, and by decreasing the rate control bit allocation in response 
to a determination that the to-stream VBV exceeds the from-stream VBV by a second 
threshold level. 
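
By way of illustration only, the threshold comparison described in this embodiment may 
be sketched as follows. The function name adapt_bit_allocation, the fixed adjustment 
step and the use of a single per-picture bit budget are assumptions made for this sketch; 
the description above does not specify how large the adjustment is. 

```python
def adapt_bit_allocation(base_bits, from_vbv, to_vbv,
                         first_threshold, second_threshold, step):
    """Adjust a rate control bit allocation from the two streams' VBV values.

    The allocation is increased when the from-stream VBV exceeds the to-stream
    VBV by the first threshold, decreased when the to-stream VBV exceeds the
    from-stream VBV by the second threshold, and otherwise left unchanged.
    """
    if from_vbv - to_vbv >= first_threshold:
        return base_bits + step
    if to_vbv - from_vbv >= second_threshold:
        return max(base_bits - step, 0)
    return base_bits


# Hypothetical values: a 200,000 bit budget adjusted in 20,000 bit steps.
print(adapt_bit_allocation(200_000, 900_000, 500_000, 100_000, 100_000, 20_000))
```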



D. Packetizing. 



The fourth step in the process of constructing a transition clip or transition stream 
comprises the step of transport encoding (i.e., packetizing) the transition clip. 

After recoding the selected frames, the I-frame and remaining GOP that were copied 
from the to-stream are appended to the recoded VES. Pending restamping of 
temporal_reference fields, the resulting transition clip comprises a syntactically complete 
MPEG-2 stream (except that it does not have a sequence_end_code) and contains all frames 
in the transition. The final step is to packetize the VES into a transport stream. 

The first step in packetizing the transition stream is to parse the transition stream to 
locate the offsets of the start of each frame (either a sequence_header or a picture_header) 
and the types of frames within the transition stream. Once this data is available, the 
dependencies between frames are calculated and the frame display order is determined. It 
should be noted that the temporal_reference fields are unsuitable for this purpose since they 
are presently invalid due to GOP restructuring. Once the display order has been 
determined, the temporal_reference fields are re-stamped and the presentation (PTS) and 
decode (DTS) time stamps are calculated for each frame in the transition stream. 
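
By way of illustration only, the derivation of display order and time stamps from the 
frame types may be sketched as follows. The sketch assumes a single GOP given in 
transmission (coded) order, a frame rate of 30000/1001 frames per second, the 90 kHz 
time stamp clock of MPEG-2 and a reorder delay of one frame period; the function name 
restamp and these parameter values are assumptions, and time stamp wrap-around is not 
modeled. 

```python
TICKS_PER_SECOND = 90_000  # MPEG-2 PTS/DTS clock


def restamp(coded_types, rate_num=30000, rate_den=1001, first_dts=0):
    """Return (temporal_reference, DTS, PTS) for each frame, in coded order."""
    # An anchor frame is displayed only after the B-frames that follow it in
    # coded order, i.e. when the next anchor arrives or the clip ends.
    display_order, pending_anchor = [], None
    for index, frame_type in enumerate(coded_types):
        if frame_type == "B":
            display_order.append(index)
        else:
            if pending_anchor is not None:
                display_order.append(pending_anchor)
            pending_anchor = index
    if pending_anchor is not None:
        display_order.append(pending_anchor)

    position = {index: pos for pos, index in enumerate(display_order)}
    ticks = TICKS_PER_SECOND * rate_den // rate_num  # 3003 ticks per frame
    return [
        (position[i], first_dts + i * ticks, first_dts + (position[i] + 1) * ticks)
        for i in range(len(coded_types))
    ]


for stamp in restamp(["I", "P", "B", "B"]):
    print(stamp)
```

In this sketch the B-frames receive equal PTS and DTS values, while each anchor frame is 
presented later than it is decoded. 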

It should be noted that according to the MPEG-2 standard, temporal discontinuities 
within a transport stream are allowed. However, since some decoders are not entirely 
compliant with the MPEG-2 standard, such allowed temporal discontinuities within a 
transport stream result in improper decoder operation. Thus, it is desirable to remove such 
temporal discontinuities within a transport stream by the use of the re-stamping process. 

Using the output of the restamping process, PES headers are generated and the 
frames are output into a PES stream. The location of each PES header and the size of each 
PES packet are recorded during this process. Finally, transport packets are generated to 
hold the PES packets. Each layer of packets adds overhead to the TS resulting in a slight 
size increase. The packets in the resulting TS are stamped with the PID of the video stream 
being spliced. The final output of the packetizing process is a TS containing a single VES. 
The stream does not contain any program specific information (PSI). 
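
By way of illustration only, the size increase introduced by transport packetization may 
be sketched as follows. The sketch relies on the 188 byte transport packet with a 4 byte 
header defined by MPEG-2, and assumes that each PES packet begins in a new transport 
packet with the final packet padded by stuffing; the function names are assumptions made 
for this sketch. 

```python
TS_PACKET_SIZE = 188                                # bytes per transport packet
TS_HEADER_SIZE = 4                                  # bytes of transport packet header
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE   # 184 payload bytes


def ts_packets_for_pes(pes_packet_size):
    """Number of transport packets needed to carry one PES packet."""
    if pes_packet_size <= 0:
        raise ValueError("pes_packet_size must be positive")
    return -(-pes_packet_size // TS_PAYLOAD_SIZE)   # ceiling division


def ts_bytes_for_pes(pes_packet_sizes):
    """Total transport stream bytes for a list of PES packet sizes."""
    return sum(ts_packets_for_pes(size) for size in pes_packet_sizes) * TS_PACKET_SIZE


# A 1000 byte PES packet occupies six transport packets (1128 bytes).
print(ts_packets_for_pes(1000), ts_bytes_for_pes([1000]))
```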



E. Remultiplexing. 



The final step in the process of constructing a transition clip or transition stream 
comprises the step of remultiplexing the video clip (now a transport stream) with program 
specific information (PSI) from the original program stream. 

To accomplish the remultiplexing step, the from-stream is examined to extract (as 
transport packets) a single instance of the program association table (PAT) and the program 
map table (PMT). In the case of splicing single program transport streams there will only 
be one PMT. In the case of splicing multiple program transport streams there will be 
multiple PMTs. Optionally, to fully implement the ATSC broadcast format, it is necessary 
to extract other tables as well (as known to those skilled in the art). 

After extracting the PAT and the PMT(s), the number of packets in the transition 
clip is calculated based on the multiplex bit rate, the number of frames in the transition clip 
and the frame rate. For example, the ATSC specification requires a PAT at least every 
100ms and a PMT at least every 400ms. The number of packets between PAT and PMT 
tables is determined from the multiplex bit rate. 

After calculating the number of packets in the transition clip, a blank transition clip 
composed of null transport packets is created and the PAT and PMT tables are inserted at 
the calculated spacings (e.g., PAT every 100ms and PMT every 400ms). 
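
By way of illustration only, the construction of the blank transition clip may be 
sketched as follows. The function name build_blank_clip, the string labels used for the 
packet slots and the rule of placing a PMT in the first free slot at or after its nominal 
position are assumptions made for this sketch; the example bit rate is chosen so that the 
arithmetic is easy to follow and is not taken from the description. 

```python
TS_PACKET_BITS = 188 * 8  # bits per transport packet


def build_blank_clip(mux_bit_rate, frame_count, frame_rate,
                     pat_interval_ms=100, pmt_interval_ms=400):
    """Return a list of packet slots: 'PAT', 'PMT' or 'NULL'."""
    packets_per_second = mux_bit_rate / TS_PACKET_BITS
    total = int(frame_count * packets_per_second / frame_rate)
    pat_gap = max(1, int(packets_per_second * pat_interval_ms / 1000))
    pmt_gap = max(1, int(packets_per_second * pmt_interval_ms / 1000))

    slots = ["NULL"] * total
    for index in range(0, total, pat_gap):
        slots[index] = "PAT"
    for index in range(0, total, pmt_gap):
        # Use the first null slot at or after the nominal PMT position.
        while index < total and slots[index] != "NULL":
            index += 1
        if index < total:
            slots[index] = "PMT"
    return slots


# 30 frames at 30 frames/s in a 1.504 Mb/s multiplex: 1000 packets in total.
clip = build_blank_clip(1_504_000, 30, 30)
print(len(clip), clip.count("PAT"), clip.count("PMT"), clip.count("NULL"))
```

The null slots that remain are the positions available for the video transport packets. 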

After appropriately inserting the PAT and PMT(s) in the blank transition stream, the 
video transport stream is inserted into the blank transition stream by spacing packets within 
the remaining available packets, thereby forming an output transport stream. 

It should be noted that when inserting the PAT, PMT and video packets into the 
empty transition clip, each packet should be restamped with a new continuity_counter. The 
starting value of the continuity_counter is determined separately for each PID from the 
exit-stream or from-stream. If the video clip is too large, then there won't be enough 
transport packets in the transition clip, since the size of the transition clip is calculated with 
respect to the expected clip duration. This calculation takes into account the frame count, 
frame rate, VBV delays, multiplex bit rate etc. It is important that VBV adjustment is 
performed properly by the encoder. 
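
By way of illustration only, the restamping of the continuity_counter may be sketched as 
follows. The sketch relies on the continuity_counter being a 4 bit field that increments 
modulo 16 for each PID, and treats the counter of null packets as unused; the function 
name restamp_continuity, the representation of packets by their PID alone and the example 
PID values are assumptions made for this sketch. 

```python
NULL_PID = 0x1FFF  # PID reserved for null transport packets


def restamp_continuity(pids, start_counters):
    """Assign a continuity_counter to each packet, tracked separately per PID.

    start_counters maps each PID to the first value to use, i.e. the value
    following the last one used for that PID in the from-stream.
    """
    counters = dict(start_counters)
    stamped = []
    for pid in pids:
        if pid == NULL_PID:
            stamped.append((pid, None))     # counter not tracked for null packets
            continue
        value = counters.get(pid, 0)
        stamped.append((pid, value))
        counters[pid] = (value + 1) % 16    # 4-bit field wraps from 15 to 0
    return stamped


# Hypothetical PIDs: 0x00 carries the PAT and 0x31 carries video.
print(restamp_continuity([0x00, 0x31, 0x31, NULL_PID, 0x31], {0x00: 5, 0x31: 15}))
```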

The completed transition clip is then inserted between the spliced transport streams 
at the calculated transport packet offsets, thereby executing a seamless splice. 

The above-described invention advantageously provides for seamless, frame 
accurate splicing or concatenation of transport streams using transition streams or clips, 
thereby avoiding the construction of an entirely new transport stream. The from- and 
to-streams are not modified during the process, since they are only used to provide 
information sufficient to produce the transition stream. The transition stream, after being 
used to effect a change between streams, may be discarded by the system or saved for future 
use. 

The invention has primarily been described within the context of generating a 
transition stream comprising video information suitable for use in providing a seamless 
splice of, illustratively, an MPEG-2 transport stream including a video stream or sub- 
stream. It will be appreciated by those skilled in the art that other forms of information are 
often associated with such video streams. For example, many video streams are associated 
with corresponding audio streams. In addition, other forms of information such as data 
essence and meta-data may be incorporated into an information stream including video 
information. Data essence is data that has a context independent of the video and/or audio 
data within a stream. Examples of data essence comprise stock quotations, weather 
advisories and other news, messages or control information not related to the video and/or 
audio data and the like. 

Meta-data is data relating to other data such as data describing characteristics of a 
video or audio stream. Examples of meta-data include video or internet data broadcast 
packets associated with a video or audio frame, such as alternate camera angles, names of 
actors in a movie, title of a presentation and the like. 

In the case of audio information, data essence and/or meta-data associated with 
particular video frames within a video information stream, it is desirable to ensure that all 
data associated with the particular frame is available to a receiver of that video frame. 
Thus, in the case of a splicing application where one or more video streams are 
concatenated to produce a spliced video stream, it is desirable to ensure that the audio 
information, data essence and/or meta-data associated with video frames utilized in the 
transition clip enabling the splice be included within that transition clip. 

FIG. 4A comprises a graphical representation of a splicing operation useful in 
understanding the invention. Specifically, FIG. 4A comprises a from-stream 410, denoted 
as stream A; a to-stream 420 denoted as stream B; and a transition stream or transition clip 
430 denoted as stream T. It should be noted that each of streams A (410), B (420), and T 
(430) are, illustratively, MPEG-2 transport streams comprising video frames (not shown), 
meta-data, data essence and audio data. These transport streams are formed by multiplexing 
a plurality of packetized information streams to provide a resulting information stream 
containing video, audio and other data streams. Unfortunately, the multiplexing process does 
not approximately align audio, data essence and meta-data packets to respective video 
frames. That is, for each video frame within the transport stream, the packets including that 
video frame may precede or follow (in bit stream order) packets including audio data, data 
essence or meta-data associated with that video frame. Thus, if a transition stream is 
formed with respect to only the video packets forming an exit or entry frame, the meta-data, 
data essence and/or audio data associated with the exit or entry frame are likely to be lost or 
incompletely provided to a transition stream. 

Stream A (410) is bounded by a start video frame 410-ST and an ending video frame 
410-END. Stream A comprises a from-stream that will be exited at an exit video frame 
410-OUT. Thus, as discussed above with respect to the transition stream generation 
methods, the plurality of information frames beginning with a transitional video frame 
410-TRAN and ending with an exit video frame 410-OUT will be decoded for use in forming 
the transition stream. However, the exit video frame 410-OUT is associated with meta-data 
410-MD, data essence 410-DE and audio data 410-AD that is located within stream A after 
the exit video frame 410-OUT. It should be noted that such data may also be located before 
the exit video frame 410-OUT. Thus, to incorporate this non-video data into the transition 
stream it is necessary to extract or decode the non-video data. Referring to stream A (410), 
the non-video data associated with the exit frame 410-OUT is bounded by the transition 
frame 410-TRAN and an extent frame 410-EXT defining the maximal boundary (or extent) 
likely to be associated with the non-video data. 

Stream B (420) is bounded by a start video frame 420-ST and an ending video frame 
420-END. Stream B comprises a to-stream that will be entered at entry video frame 
420-IN. Thus, as discussed above with respect to the transition stream generation methods, the 
plurality of information frames beginning with the entry frame 420-IN and ending with a 
transitional video frame 420-TRAN will be decoded for use in forming the transition stream 
430. However, the entry video frame 420-IN is associated with meta-data 420-MD, data 
essence 420-DE and audio data 420-AD that is located within stream B before the entry 
video frame 420-IN. It should be noted that such data may also be located after the entry 
video frame 420-IN. Thus, to incorporate this non-video data into the transition stream 430 
it is necessary to extract or decode the non-video data. Referring to stream B (420), the 
non-video data associated with the entry frame 420-IN is bounded by an extent frame 
420-EXT and the transition frame 420-TRAN. The extent frame 420-EXT defines the maximal 
boundary (or extent) likely to be associated with the non-video data preceding in bit stream 
order the entry frame 420-IN. 

Thus, to capture all of the video frames appropriate to the transition stream and all 
of the non-video data associated with those video frames, the deconstructed portion of 
stream A is bounded by 410-TRAN and 410-EXT. Similarly, the deconstructed portion of 
stream B is bounded by 420-EXT and 420-TRAN. After decoding and/or extracting the 
video data, meta-data, data essence and audio data from streams A and B, the 
transition stream 430 is formed in a manner including such data. Thus, transition stream 
[...] frame accurate splice between the two streams at the appropriate exit frame 410-OUT 
[...] the video frames included within the transition stream 430 are also included 
[...] irrespective of the splice point. That is, non-video data may be multiplexed with 
[...] in a manner preserving the association between the non-video and video 
data packets. 

FIG. 4B comprises a graphical representation of a splicing operation useful in 
understanding the invention. Specifically, FIG. 4B comprises a first multi program 
transport stream 440 and a second multi program transport stream 450. Each of the first 440 
and second 450 multi program transport streams comprises a respective plurality of 
transport sub streams. The invention may be utilized to perform frame accurate, seamless 
splicing between such multi program transport streams in a manner preserving the 
associations between non-video data and the video data associated with it. 

Transport multiplex A 440 comprises three transport sub streams, denoted as 
program 1 (441), program 2 (442) and program 3 (443). Transport MUX B 450 comprises 
[...] (453). For purposes of this discussion it is assumed that transport MUX B will be 
concatenated to transport MUX A at the sub stream level. That is, program 1 441 and 
[...] 451 will be concatenated to form a first transport sub stream within a transition 
stream comprising a plurality of sub streams. Similarly, [...] 
out frame 441-OUT while program [...]; 
program 2 will be exited at an out frame 442-OUT while program [...]; 
program 3 will be exited at an out frame 443-OUT [...] 
be entered at an in frame 453-IN. The resulting transition stream will comprise a single 
multiplex stream comprising portions of all six streams including frame accurate seamless 
splice points as indicated in FIG. 4B and described above. 

In addition to video frames, each of the transport sub streams includes non-video 
data such as meta data, data essence and audio data. As indicated in FIG. 4B, each of the 
splice points and the video frames included within the transition stream is associated with 
an extent of such non-video data. Thus, each of the transport MUX sub streams will be 
decoded or otherwise processed to accommodate the extraction of all necessary video and 
non-video data to effect individual transition sub streams. The individual transition sub 
streams are then incorporated into a multi-program transition stream for subsequently 
concatenating the first multi-program stream A (440) and the second multi-program stream 
B (450). 

FIG. 4C depicts a graphical representation of a splicing operation useful in 
understanding the invention. Specifically, FIG. 4C depicts a reservation of non-video 
packet place holders within a transition stream under construction 460. That is, while 
forming a transition stream, it is likely that the step of encoding the decoded video frames, 
from the frames being spliced is performed prior to the step of inserting non-video data into 
the partially formed transition stream. To ensure that the non-video data within the 
transition stream may be located proximate to the video data with which it is associated, 
placeholders are established during the video encoding process to allow for subsequent 
insertion of the non-video data within the transition stream. Specifically, as indicated in 
FIG. 4C, a plurality of audio, data essence and/or meta data place holders are inserted 
within a transition stream under construction. Upon completion of the transition stream, 
those place holders not utilized to store such non-video data are deleted and the resulting 
completed transition stream 460' is utilized as the transition stream. 

Within the context of a multi program transport stream such as described above with 
respect to FIG. 4B, each of the transport sub streams being formed during the transition 
stream generation process utilizes a respective set of non-video data place holders. Each 
stream, upon completion, deletes or otherwise "de-utilizes" or releases the unused place 
holders (e.g., inserting NULL data) to form a completed transition stream. 
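
Releasing unused place holders by inserting NULL data may be sketched as follows. The sketch assumes MPEG-2 transport packets of 188 bytes and uses the null packet PID 0x1FFF defined by the MPEG-2 systems standard; the representation of an unused place holder as None and the function names are hypothetical.

```python
from typing import List, Optional

TS_PACKET_SIZE = 188   # MPEG-2 transport packet length in bytes
NULL_PID = 0x1FFF      # PID reserved for null packets


def null_packet() -> bytes:
    """Build a null transport packet: sync byte 0x47, PID 0x1FFF, payload only."""
    header = bytes([0x47, 0x1F, 0xFF, 0x10])
    return header + b"\xff" * (TS_PACKET_SIZE - len(header))


def pid_of(packet: bytes) -> int:
    """Extract the 13-bit PID from a transport packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def release_unused(sub_stream: List[Optional[bytes]]) -> List[bytes]:
    """Replace each unused place holder (None) with a null packet."""
    return [p if p is not None else null_packet() for p in sub_stream]
```

Applying release_unused to each transport sub stream in turn leaves the packet count of every sub stream unchanged, which is the main difference from deleting the place holders outright.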

The resulting transition stream or transition clip 430 comprises video information 
and non-video information from each of the streams A and B. 

FIG. 12 depicts a flow diagram of a method for generating a transition stream or 
transition clip incorporating pixel domain effects. Specifically, FIG. 12 depicts a flow 
diagram of a method 1200 suitable for use in the transition clip generation function 344 of 
the Play to air server 110 of FIG. 3. 

The method 1200 is entered at step 1210, where a "from-stream" and "to-stream" 
are annotated. A method for annotating an information stream has previously been 
described with respect to FIG. 10. As previously noted, such annotation is not strictly 
necessary to practice the invention. However, the process of annotating the streams is 
useful in efficiently processing the streams in subsequent processing steps or by other 
processing components. The method 1200 then proceeds to step 1220. 

At step 1220 a portion of the from-stream prior to the exit frame is decoded, such as 
described above with respect to step 710 of the method 700 of FIG. 7. The method 1200 
then proceeds to step 1230. 

At step 1230 a portion of the to-stream beginning with the entry frame is decoded, 
such as described above with respect to step 715 of the method 700 of FIG. 7. The method 
1200 then proceeds to step 1240. 

At step 1240 the decoded portions of the from-stream and to-stream are subjected to 
one or more pixel domain processing steps to provide, for example, a special effect or other 
processing effect. The special effect provided at step 1240 may comprise one or more of 
the special effects noted in box 1245; namely, morphing, fade, wipe, dissolve, push, reveal, 
black-frame, freeze-frame or other well-known pixel domain processing effects. A 
morphing effect comprises a gradual (e.g., frame by frame) change from one shape into 
another. A wipe effect comprises a changing from one image to another image via 
intra-image regional changes, such as changing the location of a vertical bar delineating the first 
and second images from, for example, left to right or top to bottom. A fade or dissolve 
effect comprises a gradual fading or dissolving of a first image to reveal a second image 
underlying the first image. The underlying image may also emerge in a manner 
opposite to the fading first image. A black (or blue) frame effect comprises the insertion of 
a monochrome frame(s) between two images. A "push" effect is an effect wherein an old 
image appears to slide off the screen as if it were being pushed by a new image sliding onto 
the screen. The old image and new image may be slid in any direction to produce this 
effect. A "reveal" effect is where an old image is removed to reveal an underlying new 
image. A reveal effect may comprise a "peel back" effect in which a "turned up corner," or 
a graphical representation of a turned up corner, reveals a portion of a new image 
underlying the old image. Upon selection of the new image, the old image is peeled back or 
otherwise removed from view beginning with the turned up corner portion to reveal the 
underlying new image. 
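
Two of the pixel domain effects named above, the dissolve and the wipe, may be sketched as follows. This is an illustration under stated assumptions, not the processing of step 1240 itself: decoded frames are modeled as hypothetical lists of rows of grayscale values, and the function names are invented for the example.

```python
from typing import List

Frame = List[List[int]]  # rows of grayscale pixel values (assumed representation)


def dissolve(old: Frame, new: Frame, alpha: float) -> Frame:
    """Blend two frames; alpha = 0.0 shows only `old`, alpha = 1.0 only `new`."""
    return [
        [round((1 - alpha) * o + alpha * n) for o, n in zip(row_old, row_new)]
        for row_old, row_new in zip(old, new)
    ]


def wipe(old: Frame, new: Frame, progress: float) -> Frame:
    """Left-to-right wipe: columns left of the moving vertical bar come from `new`."""
    width = len(old[0])
    bar = round(progress * width)  # column index of the delineating bar
    return [row_new[:bar] + row_old[bar:] for row_old, row_new in zip(old, new)]
```

Stepping alpha or progress from 0 to 1 across the frames of a transition clip yields the gradual change described for each effect.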

A non-pixel domain effect for the meta-data domain may comprise a closed caption 
change at a sentence boundary. A non-pixel domain effect for the audio domain may 
comprise an audio fade from stream A audio, through silence, and back to audio 
information associated with stream B to form the spliced information stream. 
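
The audio fade through silence may be sketched as follows, assuming (hypothetically) that the audio on either side of the splice is available as lists of linear PCM sample values and that a linear gain ramp is acceptable; neither assumption comes from the specification.

```python
from typing import List


def fade_through_silence(a_tail: List[float], b_head: List[float]) -> List[float]:
    """Fade stream A audio out to silence, then fade stream B audio in from silence."""
    n_a, n_b = len(a_tail), len(b_head)
    # Gain falls linearly and reaches 0.0 on the last sample of stream A.
    faded_out = [s * (1 - (i + 1) / n_a) for i, s in enumerate(a_tail)]
    # Gain rises linearly from 0.0 on the first sample of stream B.
    faded_in = [s * (i / n_b) for i, s in enumerate(b_head)]
    return faded_out + faded_in
```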

The pixel domain processing step(s) may be used to provide artistic or interesting 
means of transitioning between video clips. For example, a caveat effect may be 
implemented in a 6 frame transition clip by transitioning from frame one to frame six via 
the four intervening frames including portions of frames one and six. While it is desirable 
to ensure that the pixel domain processing imparts some form of transitional information to a 
viewer, such imparting of transitional information is not necessary. The method 1200 then 
proceeds to step 1250. 

In one embodiment of the invention, the pixel domain process is performed with 
respect to a plurality of transport streams or other streams. Specifically, it is noted that the 
invention has been described above primarily within the context of two transport streams 
including at least image information being concatenated to produce a spliced transport 
stream including at least image information. During the generation of the transition stream 
or transition clip, the image information within the respective transport streams is decoded 
such that pixel domain information is available for processing by a pixel domain process. 
In one embodiment of the invention, additional pixel domain (or non-pixel domain) 
information is used during the pixel domain or non-pixel domain processing step. In a 
chroma-key processing example, a transport stream including a chroma-keying signal, 
herein denoted as a K-stream, includes video information having one or more chroma-keyed 
image regions. A first keyed image region within the K-stream may be indicated by a first 
color, while a second keyed image region of the K-stream may be indicated by a second 
color. The pixel domain information within the transition clip associated with the first 
keyed region is replaced by information from a first information source or information 
stream, while the pixel domain information within the transition clip associated with the 
second keyed region is replaced by information from a second information source or 
information stream. Thus, in the case of stream A comprising a K-stream having two keyed 
regions that is concatenated to a stream B to form a transition stream, two additional 
information streams are used (denoted as region stream one and region stream two) to provide 
image information to replace the first and second keyed regions, respectively, of the 
K-stream. It will be appreciated by those skilled in the art that any number of regions may be 
utilized and that non-pixel information may also be divided into regions. 
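
The replacement of keyed regions in a K-stream frame may be sketched as follows. The sketch assumes pixels are RGB tuples and that each keyed region is marked by an exact key color; the function name chroma_key and the mapping from key color to region stream frame are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]


def chroma_key(k_frame: Frame, region_sources: Dict[Pixel, Frame]) -> Frame:
    """Replace every pixel whose color is a key color with the co-located pixel
    of the region stream frame registered for that key color."""
    return [
        [
            region_sources[pixel][y][x] if pixel in region_sources else pixel
            for x, pixel in enumerate(row)
        ]
        for y, row in enumerate(k_frame)
    ]
```

Because the mapping may hold any number of key colors, the same routine covers more than two keyed regions.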

At step 1250, the decoded and processed video frames are re-encoded to form a 
transition stream. Step 1250 may be implemented in substantially the same manner as 
described above with respect to step 720 of the method 700 of FIG. 7. 

Thus, the method 1200 of FIG. 12 provides, in addition to the generation of a 
transition stream or transition clip, the adaptation of video information within that transition 
stream or transition clip, such that known pixel domain processing techniques may be used 
to impart a more realistic transitional impression to a viewer as the from-stream is exited and 
the to-stream is entered. It should be noted that processing in non-video domains may also be 
performed on the non-video data discussed above with respect to FIGS. 4A-4C. 

Thus, the utility of the present invention extends beyond the bare notion of pixel or 
image domain processing of only two image streams. Rather, the subject invention finds 
applicability where a plurality of information streams may be used to process pixel 
domain or other video or non-video domain information within a transition stream being 
generated. In this manner, a transition stream or transition clip may be generated in 
response to many sources of information such that video and non-video information may be 
merged with video and/or non-video information from more than the two streams forming a 
transition clip. 

It should be noted that a transition clip or stream may be formed with a 
predetermined number of video frames. As such, in addition to the previously described 
VBV processing opportunities, the predefined number of frames may be used to effect a 
pixel domain effect by selective encoding of portions of the frames. For example, given a 
transition clip having five video frames, each of the five frames may be divided into 
intra-frame regions. The first frame includes 1/6 video data from the to-stream and 5/6 data 
from the from-stream; the second frame includes 2/6 data from the to-stream and 4/6 
data from the from-stream and so on up to the fifth frame, which includes 1/6 data from the 
from-stream and 5/6 data from the to-stream. The use of user-selectable (or predetermined) 
numbers of frames between 3 and 25 frames in a transition stream provides sufficient 
flexibility to enable most pixel domain processes and VBV buffer normalization functions. 
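
The per-frame proportions in the five-frame example may be sketched as follows, under the assumption that the to-stream share grows by an equal increment on each frame, so that frame k of an n-frame clip carries k/(n+1) of its data from the to-stream. The function name mix_schedule is hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def mix_schedule(num_frames: int) -> List[Tuple[Fraction, Fraction]]:
    """Return (from_share, to_share) for each frame of the transition clip."""
    steps = num_frames + 1
    return [
        (Fraction(steps - k, steps), Fraction(k, steps))
        for k in range(1, num_frames + 1)
    ]
```

For a five-frame clip the first entry is 5/6 from-stream and 1/6 to-stream, and the last entry is 1/6 from-stream and 5/6 to-stream; the two shares of every frame sum to one.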

FIG. 13 depicts a flow diagram of a method for generating a transition stream or 
transition clip according to an embodiment of the invention. Specifically, FIG. 13 depicts a 
flow diagram of a method 1300 suitable for use in the transition clip generation function 
344 of the Play to air server 110 of FIG. 3. 

The method 1300 is entered at step 1310 where an appropriate portion of the from- 
stream video prior to an exit frame is decoded. The method 1300 then proceeds to step 
1320. 

At step 1320, non-video information such as data essence, audio, meta-data and/or 
other data within the from-stream that is associated with the decoded video portion is 
extracted or decoded. That is, auxiliary or ancillary data, such as the aforementioned non- 
video data types, that are associated with the video frames within the from-stream decoded 
at step 1310 are extracted or decoded for subsequent use in the transition stream or 
transition clip. 

At step 1330, an appropriate portion of the to-stream video beginning with an entry 
frame is decoded. The method 1300 then proceeds to step 1340. 

At step 1340, non-video data associated with the video frames decoded at step 1330 
is extracted or decoded. That is, data essence, audio, meta-data, and/or other data within the 
to-stream associated with the video frames decoded at step 1330 is extracted or decoded for 
subsequent use in the transition stream or transition clip. The method 1300 then proceeds to 
optional step 1350. 

Step 1350 is an optional processing step suitable for use on a partially formed 
transition stream or transition clip. Specifically, optional step 1350 includes three optional 
sub-steps which may be utilized independently or in any combination to effect a processing 
of the video data decoded at steps 1310 and 1330 or the non-video data extracted or 
decoded at steps 1320 and 1340. 

A first optional sub-step 1352 within optional step 1350 comprises the performance 
of any pixel domain processing of the decoded video data. That is, any of at least the pixel 
domain processing techniques described above with respect to step 1240 and box 1245 of 
FIG. 12 may be used to process the from-stream and to-stream video information decoded 
at steps 1310 and 1330, respectively. The method 1300 then proceeds to step 1354. 

At second optional sub-step 1354 of step 1350, any audio domain processing of the 
extracted or decoded audio data from steps 1320 and/or 1340 is performed. Such 
processing may include any of the known audio domain processing techniques used to 
impart, for example, a feeling of transition, or other audio impact upon a listener. The 
method 1300 then proceeds to step 1356. 

At third optional sub-step 1356 of step 1350, any data domain processing of 
extracted or decoded data essence, meta-data or other data that was extracted or decoded 
at steps 1320 and/or 1340 is performed. Such data processing may include, for example, 
adjustments to the data essence or meta-data based upon the pixel domain processing 
performed at step 1352. For example, if the meta-data describes a pixel domain characteristic of 
a transition clip video frame subjected to pixel domain processing, then the meta-data is 
processed to reflect the corresponding pixel domain processing. Other data processing 
functions may be implemented as well. The method 1300 then proceeds to step 1360. 

At step 1360 the decoded and, optionally, processed video portions of the transition 
stream are re-encoded, and the extracted or decoded audio, data essence, 
meta-data, and/or other data, including non-video data processed at steps 1352-1356, are 
encoded according to the appropriate formats or inserted depending upon the data type. 
That is, the optionally processed video and non-video information produced by steps 1310 
through 1350 is encoded or inserted to form the transition stream. 
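
The overall flow of the method 1300 may be sketched as follows. This is a schematic illustration only: decoded frames and non-video items are modeled as hypothetical strings and (kind, payload) tuples, the three optional processing sub-steps are passed in as optional callables, and the string "enc(...)" merely stands in for re-encoding.

```python
from typing import Callable, Dict, List, Optional, Tuple

NonVideo = Tuple[str, str]  # (kind, payload), e.g. ("audio", "a9")


def generate_transition(
    from_video: List[str],
    from_data: List[NonVideo],
    to_video: List[str],
    to_data: List[NonVideo],
    video_fx: Optional[Callable[[List[str]], List[str]]] = None,
    audio_fx: Optional[Callable[[str], str]] = None,
    data_fx: Optional[Callable[[str], str]] = None,
) -> Dict[str, list]:
    video = list(from_video) + list(to_video)      # decoded at steps 1310 and 1330
    non_video = list(from_data) + list(to_data)    # extracted at steps 1320 and 1340

    if video_fx is not None:                       # optional sub-step 1352
        video = video_fx(video)

    processed: List[NonVideo] = []
    for kind, payload in non_video:
        if kind == "audio" and audio_fx is not None:    # optional sub-step 1354
            payload = audio_fx(payload)
        elif kind != "audio" and data_fx is not None:   # optional sub-step 1356
            payload = data_fx(payload)
        processed.append((kind, payload))

    # Step 1360: re-encode the video and insert the non-video data (stand-in).
    return {"video": [f"enc({f})" for f in video], "non_video": processed}
```

Each optional sub-step can be supplied or omitted independently, mirroring the statement that the sub-steps may be utilized independently or in any combination.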

In an embodiment of the invention described above with respect to FIG. 4C, the 
transition stream to be formed comprises a transport stream or other stream in which a 
plurality of packets are used to represent the video and non-video data. In this embodiment 
of the invention, prior to forming a transition stream or transition clip, some portion of 
available packets utilized to hold information are reserved for non-video data purposes. In 
this manner, the video information may be processed prior to the processing of any non- 
video information such that data place holders proximate the video frames may be 
interspersed among the video frames to include data relevant to those proximate video 
frames. Thus, in this embodiment of the invention an optional step 1315 is used prior to 
step 1310 in the method 1300 of FIG. 13. Specifically, at step 1315 data place holders are 
included in the transition stream to be formed. That is, at step 1315 a portion of memory or 
plurality of packets intended to be used for the transition stream are interspersed with place 
holder information defining packets for non-video use. The method 1300 then proceeds 
through step 1310 to step 1360. 

Step 1360, per box 1365, utilizes the appropriate place holders to store non-video 
information such as optionally processed audio, meta-data, data essence and/or other data 
related to the video frames. Upon completing the transition clip or upon processing all non- 
video information and locating such processed non-video information within appropriate 
place holders, unused place holders are removed or otherwise utilized for other purposes. 

As previously noted, additional processing of the transition clip is used to ensure 
that the VBV of the from- and to-streams are accommodated in a manner providing for a 
substantially seamless splicing operation. 
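
One way to picture the VBV accommodation is the threshold comparison recited in the claims below: the rate control bit allocation is increased when the from-stream VBV parameter exceeds the to-stream VBV parameter by a first threshold, and decreased when the to-stream VBV parameter exceeds the from-stream VBV parameter by a second threshold. The following is a minimal sketch of that comparison; the fixed percentage step and all names are assumptions made for illustration.

```python
def adapt_bit_allocation(
    base_bits: int,
    from_vbv: int,
    to_vbv: int,
    first_threshold: int,
    second_threshold: int,
    step_percent: int = 10,  # hypothetical adjustment size
) -> int:
    """Return the rate control bit allocation to use when re-encoding."""
    if from_vbv - to_vbv > first_threshold:
        # From-stream VBV exceeds to-stream VBV: increase the allocation.
        return base_bits * (100 + step_percent) // 100
    if to_vbv - from_vbv > second_threshold:
        # To-stream VBV exceeds from-stream VBV: decrease the allocation.
        return base_bits * (100 - step_percent) // 100
    return base_bits  # difference within both thresholds: leave unchanged
```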

The invention has been primarily described within the context of splicing or 
concatenating two single program transport streams, i.e., transport streams containing a 
single audio-visual program, such as a movie, television show or commercial. However, 
those skilled in the art will appreciate that the invention provides frame accurate, seamless 
splicing between multi-program transport streams as well. To effect such a splice, the 
above-described methods are adapted to determine out-frames, in-frames and other 
appropriate parameters for each program within the multi-program transport streams. 

Although various embodiments which incorporate the teachings of the present 
invention have been shown and described in detail herein, those skilled in the art can readily 
devise many other varied embodiments that still incorporate these teachings. 

What is claimed is: 

1. In a system for processing transport streams including image frames, a method for 
generating a transition stream for transitioning from a first transport stream to a second 
transport stream in a substantially seamless manner, said method comprising the steps of: 

decoding (1220) a portion of said first transport stream including at least a target 
out-frame representing a last image frame of said first transport stream to be presented; 

decoding (1230) a portion of said second transport stream including at least a target 
in-frame representing a first image frame of said second transport stream to be presented; 

processing (1240), using a pixel domain process (1245), at least one of said decoded 
image frames; and 

encoding (1250) a plurality of said decoded image frames, including said target 
out-frame and said target in-frame, to produce said transition stream. 



2. The method of claim 1, wherein said pixel domain process comprises at least one of 
a morph, fade, wipe, dissolve, push, reveal, black-frame, freeze-frame and chroma-keying 
pixel domain process. 

3. The method of claim 1, further comprising the steps of: 

extracting (1320, 1340), from said first and second transport streams, non-video data 
associated with said video frames used to form said transition stream; and 

inserting (1360), into said transition stream, said extracted non-video data. 



4. The method of claim 3, wherein said non-video data comprises at least one of audio 
data, meta-data, data essence, ancillary data and auxiliary data. 



5. The method of claim 3, further comprising the step of: 

processing (1350), using a non-video domain process, at least a portion of said 
extracted non-video data. 



6. The method of claim 4, wherein said step of encoding said plurality of decoded 
image frames includes the step of transport encoding said encoded plurality of image frames, 
said method further comprising the steps of: 

reserving (1315) a plurality of transport packets within said transition stream, said 
reserved packets not being utilized to store encoded image information; 

utilizing (1365) at least a portion of said reserved plurality of transport packets to 
store said extracted non-video data. 

7. The method of claim 3, wherein said first transport stream and said second transport 
stream are multiplexed into respective first and second multiple program transport streams, 
said method further comprising the step of: 

determining, for each multiple program transport stream including a transport stream 
to be processed, a maximum extent of all image frames to be included in a transition stream; 
and 

demultiplexing each multiple program transport stream to accommodate its 
respective determined maximum extent. 



8. The method of claim 7, wherein said step of determining said image data extent 
includes the step of determining a maximum extent of all non-video data associated with 
image frames to be included in a transition stream, said maximum extent comprising a 
combination of the image data extent and the non-video data extent. 



9. The method of claim 1, further comprising the step of indexing each of said first and 
second transport streams, said step of indexing comprising the steps of: 

parsing (1010) a transport layer of a stream to be indexed to identify packets 
associated with at least one of sequence headers, picture headers and predefined splicing 
syntax; 

determining (1020), for each frame in said stream to be indexed, at least one of a 
picture number, a picture coding type, a start of frame transport packet number, an end of 
frame transport packet number, a presentation time stamp (PTS) and a decode time stamp 
(DTS). 

10. The method of claim 1, wherein said from-stream and said to-stream each comprise 
a transport stream having associated with it a respective video buffering verifier (VBV) 
parameter, said method further comprising the step of: 

determining if a difference exists between said from-stream VBV parameter and 
said to-stream VBV parameter, and 

adapting, in response to said determination, step of re-encoding. 



11. The method of claim 10, wherein said step of adapting comprises the steps of: 

increasing a rate control bit allocation in response to a determination that said from- 
stream VBV parameter exceeds said to-stream VBV parameter by a first threshold level; 
and 

decreasing said rate control bit allocation in response to a determination that said to- 
stream VBV parameter exceeds said from-stream VBV parameter by a second threshold 
level. 